How the diffusivity profile reduces the arbitrariness of protein folding free energies

M. Hinczewski¹,², Y. von Hansen¹, J. Dzubiella¹, R. R. Netz¹,*
¹Physics Department, Technical University of Munich, 85748 Garching, Germany

²Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742

*To whom correspondence should be addressed; E-mail: netz@ph.tum.de

Abstract

The concept of a protein diffusing in its free energy folding landscape has been fruitful for both theory and experiment. Yet the choice of the reaction coordinate (RC) introduces an undesirable degree of arbitrariness into the problem. We analyze extensive simulation data of an α-helix in explicit water solvent as it stochastically folds and unfolds. The free energy profiles for different RCs exhibit significant variation, some having an activation barrier, others not. We show that this variation has little effect on the predicted folding kinetics if the diffusivity profiles are properly taken into account. This kinetic quasi-universality is rationalized by an RC rescaling, which, due to the reparameterization invariance of the Fokker-Planck equation, allows the combination of free energy and diffusivity effects into a single function, the rescaled free energy profile. This rescaled free energy indeed shows less variation among different RCs than the bare free energy and diffusivity profiles separately do, if we properly distinguish between RCs that contain knowledge of the native state and those that are purely geometric in nature. Our method for extracting diffusivity profiles is easily applied to experimental single-molecule time series data and might help to reconcile conflicts that arise when comparing results from different experimental probes for the same protein.



I. INTRODUCTION 

The problem of protein folding kinetics is formidable from a purely statistical mechanics point of view: the unfolded protein, in other words the entire ensemble of microstates that significantly deviate from the native state, transits via a myriad of distinct pathways to the folded (native) state, and trying to predict folding times from basic principles is obviously hopeless. Yet, robust features have emerged from both experiments and theoretical concepts [1, 2]. A key fact is that any experiment that probes protein folding or unfolding projects protein microstates onto a low-dimensional (typically one-dimensional) observable. For example, circular dichroism in the far ultraviolet and infrared absorption spectroscopy essentially measure the average helicity, while fluorescence is sensitive to side-chain contacts or to the local solvent structure around tryptophan residues [3, 4]. Kinetic information at ambient conditions and on the short time scales relevant for fast folding events can be obtained by time-resolved spectroscopy after flash photoheating [5] or by FRET and TTET correlation studies that couple to the distance between a donor and an acceptor linked to two positions along the peptide chain [6, 7]. More recently, single-molecule spectroscopic techniques have allowed the observation of time-dependent folding/unfolding of individual proteins, thus going beyond ensemble averaging [8, 9]. Likewise, single-molecule studies where forces are applied at two points along the peptide backbone probe the distance between those two anchoring points [10]. All these experimental observables in fact constitute distinct reaction coordinates (RCs).

Exponential distributions of folding times found for many (but not all) proteins using different techniques suggest two-state folding as a quite general paradigm of folding kinetics: here the folded and unfolded states are separated by a free energy barrier along the respective RC [?]. Even proteins folding via many intermediate states can produce a single-exponential folding time if there exists a rate-limiting transition. Therefore, as long as the RC of choice distinguishes the two states connected by the rate-limiting step, different kinds of measurement (i.e., different RCs) would likely generate similar single-exponential kinetics even in such a case. Similar conclusions can be drawn from the direct observation of population distributions, where a free energy barrier means that folding intermediates are rarely observed [8, 9]. The recent observation that different experimental techniques yield different kinetics [11] or distribution functions [12, 13] when applied to the same protein casts doubt on the clear division between two-state folders (exhibiting a free energy barrier) and downhill folders (without such a barrier). In this paper we argue that such inconsistencies can arise when implicitly referring to different RCs, and show how conflicting results can be reconciled.

In theoretical studies, various RCs have become popular to characterize the folding transition, either because they approximately correspond to an experimentally accessible observable or because they are simple to calculate. The radius of gyration, the fraction of native contacts between residues, or the mean distance from the native state are typical examples [14, 15]. More complex topological order parameters such as the contact order have been suggested for describing universal features of protein folding kinetics [16]. In the theoretical framework that naturally emerges, the protein diffuses along the RC, governed by a stochastic equation and subject to deterministic forces encoded in the free energy landscape, as well as to stochastic forces due to the random environment [17–19]. Early on, it was realized that the diffusion constant in this coarse-grained picture is an effective quantity that takes into account the connectivity between states (i.e., the number of possible connecting paths), the energetic ruggedness of such paths [20], as well as orthogonal degrees of freedom [21]. As folding progresses, internal friction starts to play a more dominant role [22, 23], while solvent friction becomes less important as more and more peptide groups lose solvent contact [?]. Recently, the simplification of a constant diffusivity was abandoned and diffusivity profiles were extracted from simulations of peptides: these works either considered proteins without solvent (and thus excluded variations of the solvent friction) [24–26] or considered exclusively short-time dynamics and thus are not applicable to global folding kinetics [27]. The threefold coupling between the choice of a specific RC and the free energy and diffusivity profiles in the presence of explicit solvent has remained elusive.

In this paper we perform an in-depth analysis of long MD trajectories of an α-helix-forming oligopeptide including explicit water. Such model peptides are the subject of detailed experimental studies and constitute some of the simplest peptides that exhibit nontrivial folding kinetics [28]. They are thus interesting in their own right and at the same time, owing to their small size, allow for realistic modelling, including solvent degrees of freedom, over times much longer than their folding times [29]. As a prerequisite for our analysis, we introduce a simple way of extracting diffusivity profiles from time series data for an arbitrary RC that can be conveniently applied to experimental spectroscopic data [?], or to force spectroscopic data for RNA [30] or proteins [31], as well. We demonstrate that different RCs for one and the same protein trajectory are associated with substantially different free energy profiles, some showing a barrier separating the folded and unfolded helix states, some showing no barrier at all (which is not surprising and has been found in different contexts before [32]). This resembles the experimental findings in connection with the dispute on downhill versus two-state folding [12, 13], but is resolved by accounting for the spatially inhomogeneous diffusivity: the diffusivity profiles are full of structure and show considerable variation among different RCs. No simple connection between the free energy and diffusivity profiles seems to exist. Yet, the folding kinetics predicted using a stochastic approach based on the free energy landscape is largely independent of the RC if and only if the diffusivity profile is taken into account. Thus, despite the variation between free energy profiles along different RCs, kinetic quasi-universality emerges once the coupling to diffusivity is included (where we distinguish between RCs that contain knowledge of the native state and those that are purely geometric in nature). Specifically, this means that the presence of a free energy barrier (i.e., the absence of intermediate states) is in principle compatible with both exponential and non-exponential kinetics, and that different experimental probes are bound to measure different free energy profiles. The same conclusions also apply to more refined or optimized RCs [33–37]. A full understanding of protein folding kinetics thus requires measuring both average distributions and kinetic trajectories. Similar conclusions were very recently drawn from a Bayesian analysis of folding trajectories of simple coarse-grained model peptides based on implicit-solvent simulations [26]. Since α-helices are a prominent folding motif, the features we find are most likely relevant for more complex proteins as well.

II. METHODS 

Simulations - Standard all-atom MD simulations provide 1.1 μs trajectories of an alanine (A)-based peptide with sequence Ace-AEAAAKEAAAKA-Nme in explicit water [29], which is a shortened version of similar sequences with charged Glu⁻ (E) and Lys⁺ (K) residues at positions i and i + 4 that are experimentally known to form α-helices spontaneously [28]. The mechanism for α-helix formation involves, in addition to the stabilizing influence of E-K salt bridges, hydration effects [29, 38]. The MD simulations utilize the parallel module sander.MPI in the Amber 9.0 package with the ff03 force field and the TIP3P water model at a pressure of 1 bar and a temperature T, fixed by a Berendsen barostat and a Langevin thermostat, respectively [39]. The periodically repeated cubic simulation box has an edge length L ≈ 36 Å and contains ≈ 1500 water molecules. Electrostatic interactions are calculated by particle mesh Ewald summation, and real-space electrostatic and van der Waals interactions are cut off at 9 Å. As a check on the convergence of the standard MD simulation, replica-exchange MD (REMD) simulations are performed with the AMBER10 simulation package [39]. Here the same force field and system parameters as in the standard MD simulation are employed, apart from switching to a constant-volume ensemble. We use 32 replicas in a temperature range between 265 and 520 K, with each replica simulated for 22.5 ns, amounting to a total sampling time of 720 ns. Temperature exchanges between neighboring replicas are attempted every 250 integration steps, leading to an exchange rate of 10–30%.

TABLE I. List of reaction coordinates (RCs) used in the paper.

RC notation   description
$Q_1$         RMS deviation from perfect helix
$Q_2$         native intra-backbone hydrogen bond length
$Q_3$         inverse native hydrogen bond length
$Q_4$         radius of gyration
$Q_5$         end-to-end distance

Reaction coordinates - Trajectory analysis is performed using the ptraj tool in the Amber package [39]. The helicity (i.e., the α-helical fraction) is identified using the DSSP method of Kabsch and Sander [40]. In addition, we focus on five different RCs to follow the folding kinetics:

(i) $Q_1$, defined as the root-mean-square distance from a fully helical reference structure, averaged over all $M$ atoms of the peptide. The reference structure was chosen randomly from configurations that display 100% helicity, with little variation depending on the specific choice.

(ii) The mean native hydrogen bond (HB) length, $Q_2 = \sum_{i=1}^{N-4} r_{i,i+4}/(N-4)$, averaged over all $N = 14$ residues including the acetyl (Ace) and amine (Nme) end caps, where $r_{ij}$ is the distance between the HB-forming backbone atoms of residues $i$ and $j$.

(iii) The mean inverse HB length, $Q_3 = 1 - (N-4)^{-1}\sum_{i=1}^{N-4} r^0_{i,i+4}/r_{i,i+4}$, where $r^0_{i,i+4} \approx 2$ Å is the native HB length in the folded state, defined by the most probable length of each $(i, i+4)$ HB.

(iv) The radius of gyration, $Q_4 = \left[\sum_{i,j=1}^{M} r_{ij}^2/(2M^2)\right]^{1/2}$, a measure of the average peptide size that is accessible in scattering experiments.

(v) $Q_5$, the distance between the centres of mass of the two end caps.

Trajectories are recorded with a resolution of 20 ps, giving a total of 54171 data points. To compare different RCs with each other, we exclude for each RC the 11 smallest and 11 largest values and define rescaled RCs

$$q_i = \frac{Q_i - Q_i^{\min}}{Q_i^{\max} - Q_i^{\min}}, \qquad (1)$$

such that the minimal and maximal values of the remaining 54149 data points, denoted as $Q_i^{\min}$ and $Q_i^{\max}$, are mapped onto the RC values $q_i = 0$ and $q_i = 1$, respectively.

FIG. 1. Complete time series data of the simulation run for the peptide in explicit water. Shown are the helicity and the five considered RCs defined in Table I. Lines in black/blue show the full-resolution data (20 ps), while red lines are smoothed over time windows of 2 ns. The right panels show selected data windows at higher time resolution for $q_1$ and $q_3$, together with a few selected MD snapshots of the peptide backbone structure.
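The RC definitions (i) and (iv) and the rescaling of Eq. (1) can be sketched in a few lines of Python/NumPy. This is an illustrative sketch, not the ptraj-based analysis used here: removing translation and rotation by optimal superposition (Kabsch algorithm) before computing the RMSD is our assumption, since the text does not say how the structures are aligned; all atoms are weighted equally, as in the $Q_4$ formula; and the function names are hypothetical.

```python
import numpy as np


def rmsd_after_superposition(coords, ref):
    """RMS deviation of an (M, 3) structure from a reference structure (Q1-like).

    Assumption: translation and rotation are removed first by optimal
    superposition (Kabsch algorithm).
    """
    x = coords - coords.mean(axis=0)            # remove translation
    y = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(x.T @ y)           # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))      # d = -1 would mean a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # optimal proper rotation x -> y
    return np.sqrt(np.mean(np.sum((x @ rot.T - y) ** 2, axis=1)))


def radius_of_gyration(coords):
    """Q4 = [sum_{i,j} r_ij^2 / (2 M^2)]^(1/2), all M atoms weighted equally."""
    m = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]   # all pair vectors r_i - r_j
    return np.sqrt(np.sum(diff ** 2) / (2.0 * m ** 2))


def rescale_rc(Q, n_trim=11):
    """Eq. (1): drop the n_trim smallest and largest values, map the rest onto [0, 1].

    Returns the rescaled values and the time-ordered indices of the kept frames.
    """
    order = np.argsort(Q)
    keep = np.sort(order[n_trim:len(Q) - n_trim])
    Q_kept = Q[keep]
    q_lo, q_hi = Q_kept.min(), Q_kept.max()
    return (Q_kept - q_lo) / (q_hi - q_lo), keep


# Usage with synthetic data (not simulation output)
rng = np.random.default_rng(1)
ref = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([1.0, -2.0, 0.5])   # rigidly moved copy of ref
print(rmsd_after_superposition(moved, ref))           # ~0 up to round-off
q, keep = rescale_rc(rng.normal(size=54171))
print(len(q), q.min(), q.max())                        # 54149 0.0 1.0
```

The trimming is applied separately to each RC, so the retained frames generally differ from one RC to another.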

Diffusion constant - We assume that the stochastic time evolution of a given RC is described by the one-dimensional Fokker-Planck (FP) equation [41]

$$\frac{\partial}{\partial t}\Psi(Q,t) = \frac{\partial}{\partial Q} D(Q)\, e^{-\beta F(Q)} \frac{\partial}{\partial Q}\left[\Psi(Q,t)\, e^{\beta F(Q)}\right], \qquad (2)$$

where $\Psi(Q,t)$ is the probability of having a configuration with RC value $Q$ at time $t$, $D(Q)$ is the (in general $Q$-dependent) diffusivity, $\beta = 1/(k_B T)$, and $\beta F(Q) = -\ln\langle\Psi(Q)\rangle$ is the free energy, where $\langle\Psi(Q)\rangle$ is the time-averaged probability distribution. A few methods to extract $D(Q)$ from time-series data, based on Bayesian analysis of transition rates [34, 42] or on short-time fluctuations [25, 27], have been described. Our method extracts $D(Q)$ directly from folding times.
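Define $\tau_{FP}(Q, Q_f)$ as the mean first passage (MFP) time to go from a state $Q$ to some final state $Q_f$, i.e., the mean time at which $Q_f$ is reached for the first time, corresponding to an absorbing boundary condition at $Q_f$. For the case $Q > Q_f$ one finds [43]

$$\tau_{FP}(Q, Q_f) = \int_{Q_f}^{Q} dQ' \frac{e^{\beta F(Q')}}{D(Q')} \int_{Q'}^{Q_{\max}} dQ''\, e^{-\beta F(Q'')}, \qquad (3)$$

and for $Q < Q_f$ one has

$$\tau_{FP}(Q, Q_f) = \int_{Q}^{Q_f} dQ' \frac{e^{\beta F(Q')}}{D(Q')} \int_{Q_{\min}}^{Q'} dQ''\, e^{-\beta F(Q'')}, \qquad (4)$$

where reflecting (zero-flux) boundary conditions hold at $Q_{\min}$ and $Q_{\max}$. By differentiation with respect to $Q$, we obtain the diffusivity for $Q > Q_f$ as

$$D(Q) = \frac{e^{\beta F(Q)}}{\partial \tau_{FP}(Q, Q_f)/\partial Q} \int_{Q}^{Q_{\max}} dQ'\, e^{-\beta F(Q')}, \qquad (5)$$

and for $Q < Q_f$ as

$$D(Q) = -\frac{e^{\beta F(Q)}}{\partial \tau_{FP}(Q, Q_f)/\partial Q} \int_{Q_{\min}}^{Q} dQ'\, e^{-\beta F(Q')}. \qquad (6)$$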
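Eqs. (3) and (5) can be checked against each other numerically: computing $\tau_{FP}(Q, Q_f)$ from a known $F$ and $D$ and then inverting it with Eq. (5) should return the input diffusivity. The sketch below assumes profiles tabulated on a uniform grid and trapezoidal quadrature; the test profiles and function names are hypothetical.

```python
import numpy as np


def cumtrapz0(y, x):
    """Cumulative trapezoidal integral of y over x, starting from 0 at x[0]."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))])


def tail_integral(Q, betaF):
    """int_{Q_i}^{Q_max} exp(-beta F) dQ' for every grid point Q_i."""
    total = cumtrapz0(np.exp(-betaF), Q)
    return total[-1] - total


def mfp_time_from_above(Q, betaF, D, i_f):
    """Eq. (3): tau_FP(Q, Q_f) for the grid points Q >= Q_f = Q[i_f]."""
    g = np.exp(betaF) / D * tail_integral(Q, betaF)
    return cumtrapz0(g[i_f:], Q[i_f:])


def diffusivity_from_mfp(Q, betaF, tau, i_f):
    """Eq. (5): D(Q) = exp(beta F) * int_Q^{Q_max} exp(-beta F) / (d tau_FP / dQ)."""
    slope = np.gradient(tau, Q[i_f:])
    return np.exp(betaF[i_f:]) * tail_integral(Q, betaF)[i_f:] / slope


# Round trip through Eqs. (3) and (5) with made-up test profiles
Q = np.linspace(0.0, 1.0, 2001)
betaF = 2.0 * np.cos(2.0 * np.pi * Q)
D = 1.0 + 0.5 * np.sin(3.0 * np.pi * Q)
i_f = 200                                          # Q_f = 0.1
tau = mfp_time_from_above(Q, betaF, D, i_f)
D_back = diffusivity_from_mfp(Q, betaF, tau, i_f)
```

The first and last grid points are best excluded when comparing `D_back` with `D`: the derivative is one-sided there, and at $Q_{\max}$ the tail integral vanishes.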

An even simpler procedure employs the round-trip time

$$\tau_{RT}(Q, Q_f) = \mathrm{sign}(Q - Q_f)\left[\tau_{FP}(Q, Q_f) + \tau_{FP}(Q_f, Q)\right], \qquad (7)$$

the magnitude of which is the time needed to start at $Q$, reach $Q_f$ for the first time, start from $Q_f$ again, and reach back to $Q$ for the first time. One finds

$$\tau_{RT}(Q, Q_f) = Z \int_{Q_f}^{Q} dQ' \frac{e^{\beta F(Q')}}{D(Q')}, \qquad (8)$$

where $Z = \int_{Q_{\min}}^{Q_{\max}} dQ\, e^{-\beta F(Q)}$ is the partition function. The diffusivity profile based on the round-trip time reads

$$D(Q) = \frac{Z\, e^{\beta F(Q)}}{\partial \tau_{RT}(Q, Q_f)/\partial Q}. \qquad (9)$$
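A similar numerical sketch, under the same assumptions (tabulated profiles, trapezoidal quadrature, hypothetical names and test profiles), shows that adding the two first passage times of Eqs. (3) and (4) reproduces the closed form of Eq. (8), because the two inner integrals add up to $Z$, and that Eq. (9) recovers $D(Q)$ from the slope of $\tau_{RT}$.

```python
import numpy as np


def cumtrapz0(y, x):
    """Cumulative trapezoidal integral of y over x, starting from 0 at x[0]."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))])


def round_trip_from_mfp(Q, betaF, D, i_f):
    """Eq. (7) for Q >= Q_f = Q[i_f]: tau_FP(Q, Q_f) + tau_FP(Q_f, Q)."""
    head = cumtrapz0(np.exp(-betaF), Q)            # int_{Q_min}^{Q'} exp(-beta F)
    tail = head[-1] - head                         # int_{Q'}^{Q_max} exp(-beta F)
    s = np.exp(betaF) / D
    down = cumtrapz0((s * tail)[i_f:], Q[i_f:])    # Eq. (3): from Q down to Q_f
    up = cumtrapz0((s * head)[i_f:], Q[i_f:])      # Eq. (4): from Q_f up to Q
    return down + up


def round_trip_closed_form(Q, betaF, D, i_f):
    """Eq. (8): tau_RT(Q, Q_f) = Z * int_{Q_f}^{Q} exp(beta F) / D."""
    Z = cumtrapz0(np.exp(-betaF), Q)[-1]
    return Z * cumtrapz0((np.exp(betaF) / D)[i_f:], Q[i_f:])


def diffusivity_from_round_trip(Q, betaF, tau_rt, i_f):
    """Eq. (9): D(Q) = Z exp(beta F) / (d tau_RT / dQ)."""
    Z = cumtrapz0(np.exp(-betaF), Q)[-1]
    return Z * np.exp(betaF[i_f:]) / np.gradient(tau_rt, Q[i_f:])


Q = np.linspace(0.0, 1.0, 2001)                   # made-up test profiles
betaF = 2.0 * np.cos(2.0 * np.pi * Q)
D = 1.0 + 0.5 * np.sin(3.0 * np.pi * Q)
i_f = 200
tau_sum = round_trip_from_mfp(Q, betaF, D, i_f)
tau_rt = round_trip_closed_form(Q, betaF, D, i_f)
D_back = diffusivity_from_round_trip(Q, betaF, tau_rt, i_f)
```

Unlike Eq. (5), Eq. (9) needs only the total $Z$ rather than a position-dependent partial integral, which is one reason the round-trip route is simpler to apply.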

Intuitively, the slope of the round-trip time function is inversely proportional to $D(Q)$: for a given $F(Q)$, a larger slope implies a slower return to the starting point, or equivalently a smaller local diffusivity. The FP approach assumes an underlying Markovian process, meaning that $D(Q)$, and thus $\partial\tau_{RT}(Q, Q_f)/\partial Q$, are independent of $Q_f$. We exploit (and check) this by defining a mean round-trip time function $\bar\tau_{RT}(Q)$ that results from an average of the round-trip times $\tau_{RT}(Q, Q_f)$ over their final states $Q_f$. Since on the FP level the $\tau_{RT}(Q, Q_f)$ curves for different $Q_f$ differ only by an additive constant, we should be able to collapse all such curves onto $\bar\tau_{RT}(Q)$. The assumption of Markovian behavior breaks down at short times and for unsuitable RCs (i.e., RCs that do not single out the transition state, as will be explained in detail later on); such a breakdown is clearly indicated by deviations of the round-trip time functions for varying $Q_f$, $\tau_{RT}(Q, Q_f)$, from the mean $\bar\tau_{RT}(Q)$. Insight into this can be gained with a simpler definition of the diffusivity based on the variance in RC space [27],

$$D_{var}(Q_0, \delta t) = \frac{\left\langle \left[Q(\delta t, Q_0) - \langle Q(\delta t, Q_0)\rangle\right]^2 \right\rangle}{2\,\delta t}, \qquad (10)$$

where $Q(\delta t, Q_0)$ denotes one specific realization of a path that starts at $Q_0$ at time zero and is observed after the lag time $\delta t$, and the brackets denote an average over all such path segments. As we will demonstrate, $D_{var}(Q_0, \delta t)$ depends sensitively on the lag time $\delta t$. To get accurate results, $\delta t$ should be small enough that the region explored by the RC in this time interval has an approximately constant free energy; however, if $\delta t$ is below a threshold time scale, the resulting $D_{var}$ may be dominated by non-Markovian properties. We will mostly use the round-trip method for determining $D(Q)$, but compare it to the other methods as well.
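The variance estimate of Eq. (10) can be sketched for a discretely sampled RC as follows. Two assumptions go beyond the text: starting points $Q_0$ are pooled within finite RC intervals, and the variance of the displacement $Q(t+\delta t) - Q(t)$ is used instead of that of $Q(t+\delta t)$, which removes the spread of starting points inside an interval. The function name and the synthetic test trajectory are hypothetical.

```python
import numpy as np


def variance_diffusivity(traj, dt, lag, edges):
    """D_var per RC interval following Eq. (10), with lag time lag * dt.

    Assumptions: starting points are pooled per interval given by `edges`, and
    the variance of the displacement traj[t + lag] - traj[t] replaces that of
    traj[t + lag].
    """
    start = traj[:-lag]
    disp = traj[lag:] - start
    idx = np.digitize(start, edges[1:-1])         # interval index 0 .. K-1
    d_var = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        sel = disp[idx == k]
        if sel.size > 1:
            d_var[k] = sel.var() / (2.0 * lag * dt)
    return d_var


# Free Brownian motion with D = 0.5 (synthetic check, not simulation data)
rng = np.random.default_rng(0)
D_true, dt = 0.5, 1.0
steps = rng.normal(scale=np.sqrt(2.0 * D_true * dt), size=100_000)
traj = np.concatenate([[0.0], np.cumsum(steps)])
edges = np.quantile(traj, np.linspace(0.0, 1.0, 6))   # five intervals of equal occupancy
d1 = variance_diffusivity(traj, dt, 1, edges)
d10 = variance_diffusivity(traj, dt, 10, edges)
```

For free Brownian motion the estimate returns the input $D$ for any lag time; the strong lag-time dependence discussed in the text therefore reflects free energy variations and non-Markovian effects in the real trajectory.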

In our analysis of the simulation time series data we typically discretize the RCs into $K = 50$ intervals and normalize probability distributions according to $\sum_{k=1}^{K} \Psi(Q_k, t) = K$, where $Q_k$ labels the $k$-th interval.
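As an illustration of this discretization, and of how round-trip times can be read off a discretized time series, here is a sketch. The counting rule (a leg ends when the trajectory reaches or crosses the target interval) is our assumption, since the extraction details are deferred to the Supplement; the function names and the toy trajectory are hypothetical.

```python
import numpy as np


def free_energy_profile(q, K=50):
    """Histogram q in [0, 1] into K intervals with sum_k Psi_k = K; beta F = -ln Psi.

    Empty intervals get beta F = +inf.
    """
    counts, edges = np.histogram(q, bins=K, range=(0.0, 1.0))
    psi = K * counts / counts.sum()
    with np.errstate(divide="ignore"):
        beta_f = -np.log(psi)
    return edges, psi, beta_f


def mean_round_trip_time(bins, a, b, dt):
    """Mean duration of round trips a -> b -> a in a series of interval indices.

    Assumption: a leg ends when the trajectory reaches or crosses the target
    interval, so jumps over several intervals within one frame still count.
    """
    if a == b:
        raise ValueError("start and final interval must differ")
    if a < b:
        at_start, at_target = bins <= a, bins >= b
    else:
        at_start, at_target = bins >= a, bins <= b
    trips, t_start, heading_out = [], None, True
    for t in range(len(bins)):
        if t_start is None:
            t_start = t if at_start[t] else None     # wait for the first visit to a
        elif heading_out and at_target[t]:
            heading_out = False                      # reached b, now return to a
        elif not heading_out and at_start[t]:
            trips.append((t - t_start) * dt)         # back at a: one round trip done
            t_start, heading_out = t, True           # the next round trip starts here
    return float(np.mean(trips)) if trips else float("nan")


# Toy trajectory of interval indices sampled every 20 ps
bins = np.array([0, 1, 2, 1, 0, 1, 2, 1, 0])
print(mean_round_trip_time(bins, 0, 2, dt=20.0))   # 80.0 (ps)
```

In practice the intervals follow from `np.digitize(q, edges[1:-1])`, and repeating the round-trip measurement for many pairs $(Q, Q_f)$ yields the curves that are collapsed onto $\bar\tau_{RT}(Q)$.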

Fit of round-trip times - To extract $D(Q)$ from the simulation data requires estimating the derivative $d\bar\tau_{RT}(Q)/dQ$. We start by fitting a smooth function to the numerical results, exploiting the fact that $\bar\tau_{RT}(Q)$ should be a monotonically increasing function of $Q$. Thus the fitting function $\bar\tau_{RT,fit}(Q)$ can be expressed in the form

$$\bar\tau_{RT,fit}(Q) = \bar\tau_{RT,fit}(Q_{\min}) + \int_{Q_{\min}}^{Q} dQ'\, e^{W(Q')}, \qquad (11)$$

where $W(Q')$ is an arbitrary function. We expand $W(Q')$ in a basis of cubic B-splines defined over the range $Q_{\min}$ to $Q_{\max}$ and use the coefficients of the expansion as fitting parameters. The size of the basis is fixed at 40 splines. The full expression for $\bar\tau_{RT,fit}(Q)$ is fit to the simulation estimate of $\bar\tau_{RT}(Q)$ using a standard least-squares technique, with one modification: the quantity to be minimized is the sum of squared residuals plus an additional term that penalizes roughness in the fitted function. This additional term has the form $\lambda \int_{Q_{\min}}^{Q_{\max}} dQ' \left[dW(Q')/dQ'\right]^2$, with smoothing parameter $\lambda$. Larger values of $\lambda$ lead to progressively smoother fits to the data. The entire fitting procedure is implemented with the Functional Data Analysis package in the R programming language [44]. For all results shown below we set $\lambda = 50$, since we found that varying $\lambda$ in the range 10–200 had minimal effect on the resulting diffusion profiles. Values $\lambda \lesssim 10$ are unsuitable because the fit then follows jagged features in the simulated $\bar\tau_{RT}(Q)$ curve that result from statistical noise. For $\lambda \gtrsim 200$, we over-smooth the curve, losing most of the local slope information and obtaining poor fits to the round-trip function.

FIG. 2. Mapping from RC $q_1$ to different RCs. Plotted is the mean distribution $\langle\Psi(q)\rangle$ for the entire time series data in Fig. 1 and, in different colours, selected regions of the distribution.
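The monotone fit of Eq. (11) can be sketched in Python with SciPy. This is not the R Functional Data Analysis implementation used here: as a simplification, $W$ is represented by its values at uniformly spaced nodes instead of a cubic B-spline expansion, the integral of $e^{W}$ and the roughness penalty are discretized on those nodes, and the smoothing parameter `lam` is therefore not on the same scale as $\lambda = 50$ above. Function and parameter names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_monotone_round_trip(q, tau, n_grid=40, lam=1e-3):
    """Fit tau_fit(q) = c + int_{q_min}^{q} exp(W(q')) dq' with a roughness penalty.

    Simplification (not the B-spline basis used in the text): W is represented by
    its values at n_grid uniformly spaced nodes, the integral of exp(W) uses the
    trapezoidal rule between nodes, and lam * int (dW/dq)^2 dq is discretised on
    the same nodes.
    """
    grid = np.linspace(q.min(), q.max(), n_grid)
    h = grid[1] - grid[0]

    def curve(params):
        c, ew = params[0], np.exp(params[1:])
        steps = 0.5 * (ew[1:] + ew[:-1]) * h            # trapezoidal pieces of int exp(W)
        return c + np.concatenate([[0.0], np.cumsum(steps)])

    def objective(params):
        resid = tau - np.interp(q, grid, curve(params))
        roughness = np.sum(np.diff(params[1:]) ** 2) / h   # ~ int (dW/dq)^2 dq
        return np.sum(resid ** 2) + lam * roughness

    slope0 = max((tau.max() - tau.min()) / (q.max() - q.min()), 1e-12)
    x0 = np.concatenate([[tau.min()], np.full(n_grid, np.log(slope0))])
    res = minimize(objective, x0, method="L-BFGS-B",
                   options={"maxiter": 5000, "maxfun": 200000})
    return grid, curve(res.x), np.exp(res.x[1:])   # nodes, fitted curve, slope at nodes


# Synthetic, noisy but monotone "round-trip" data (not simulation output)
q = np.linspace(0.0, 1.0, 200)
tau_true = 10.0 * q + np.sin(2.0 * np.pi * q) / np.pi
rng = np.random.default_rng(0)
tau_noisy = tau_true + rng.normal(scale=0.01, size=q.size)
grid, tau_fit, slope = fit_monotone_round_trip(q, tau_noisy)
```

Because the fitted curve is built from $e^{W}$, the returned slope is positive by construction, so inserting it into Eq. (9) cannot yield a negative diffusivity.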

Reparameterization - As is well known [45, 46], the FP Eq. (2) is invariant under an arbitrary RC rescaling $\tilde Q = \tilde Q(Q)$ if the functions $\Psi$, $F$, $D$ are simultaneously rescaled as $\tilde\Psi = \Psi/\tilde Q'$, $\tilde F = F + \beta^{-1}\ln\tilde Q'$, and $\tilde D = (\tilde Q')^2 D$. Here, $\tilde Q' = d\tilde Q(Q)/dQ$ is assumed positive. Thus an arbitrary diffusivity profile $\tilde D(\tilde Q)$ can be obtained, while the kinetics on the FP level and the partition function $Z$ stay invariant, as long as the free energy is adjusted accordingly. For the particular choice of a constant diffusivity, $\tilde D = D_0$, we get $\tilde Q' = \sqrt{D_0/D}$ and thus $\tilde F = F - (2\beta)^{-1}\ln(D/D_0)$.
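The transformation to constant diffusivity can be checked numerically: after rescaling, Eq. (8) evaluated with $\tilde F$ and $D_0$ on the new coordinate should reproduce the original partition function and round-trip times. A sketch with hypothetical test profiles and names, assuming trapezoidal quadrature:

```python
import numpy as np


def cumtrapz0(y, x):
    """Cumulative trapezoidal integral of y over x, starting from 0 at x[0]."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))])


def to_constant_diffusivity(Q, betaF, D, D0=1.0):
    """Map (Q, beta F, D) onto a coordinate with constant diffusivity D0.

    Uses dQ~/dQ = sqrt(D0 / D) and beta F~ = beta F - ln(D / D0) / 2.
    """
    Q_new = cumtrapz0(np.sqrt(D0 / D), Q)
    betaF_new = betaF - 0.5 * np.log(D / D0)
    return Q_new, betaF_new


def partition_and_round_trip(Q, betaF, D):
    """Z and tau_RT(Q, Q[0]) from Eq. (8); D may be an array or a scalar."""
    Z = cumtrapz0(np.exp(-betaF), Q)[-1]
    return Z, Z * cumtrapz0(np.exp(betaF) / D, Q)


Q = np.linspace(0.0, 1.0, 4001)                   # made-up test profiles
betaF = 2.0 * np.cos(2.0 * np.pi * Q)
D = 1.0 + 0.5 * np.sin(3.0 * np.pi * Q)
Q_new, betaF_new = to_constant_diffusivity(Q, betaF, D, D0=2.0)
Z, tau = partition_and_round_trip(Q, betaF, D)
Z_new, tau_new = partition_and_round_trip(Q_new, betaF_new, 2.0)
```

The agreement is limited only by quadrature error, which illustrates why the rescaled free energy $\tilde F$ alone carries the kinetic information once the diffusivity has been absorbed into the coordinate.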



III. RESULTS 

Fig. 1 shows the complete time series data for the simulated oligopeptide. In all five RCs and in the helicity data, frequent switching between the folded state (large helicity and small $q_1$ values) and the unfolded state is observed, indicating that the simulation is converged and allows conclusions to be drawn on the folding and unfolding kinetics (further evidence is provided by the excellent agreement between standard MD and replica-exchange simulations shown in Fig. 6). The fine-resolution data (Fig. 1, right panels) in terms of the RMS deviation from the fully helical state, RC $q_1$, suggest that an intermediate state and two barriers are present. As the snapshots indicate, in the fully helical state ($q_1 \approx 0.1$) roughly three α-helical turns are stabilized by salt bridges between the Glu⁻-2 and Lys⁺-6 and the Glu⁻-7 and Lys⁺-11 residues, respectively. In the intermediate state ($q_1 \approx 0.4$) only one of the two salt bridges stabilizes two turns, while in the unfolded state ($q_1 > 0.7$) no bridge is present. Note that the characteristic transition time for unfolding of one helical turn, i.e., for the transition from $q_1 \approx 0.4$ to $q_1 \approx 0.7$ in Fig. 1(d), is roughly 200 ps and thus about 100 times shorter than the corresponding unfolding time in Fig. 3(e). While a high degree
of correlation between different RCs can be inferred from Fig. 1, there is no one-to-one mapping; e.g., $q_3$ in Fig. 1(c) shows pronounced fluctuations in intervals where $q_1$ stays virtually constant.

This is already evident from the average distribution function $\langle\Psi(q)\rangle$ shown in Fig. 2 as a function of all the different RCs. While the distribution $\langle\Psi(q_1)\rangle$ in the leftmost panel shows three broad peaks (corresponding roughly to zero, one, and two intact salt bridges), clearly separated peaks are absent when $\langle\Psi\rangle$ is shown as a function of $q_2$, $q_3$, $q_4$, or $q_5$. The reason is simple: states that are separated when described by, e.g., $q_1$ are mixed when they are projected onto different RCs. This is demonstrated by the coloured regions in Fig. 2, which for $q_1$ correspond to pure states, i.e., narrow intervals of $q_1$ values. While for $q_2$ and $q_3$ the coloured regions are smeared out but the ordering along the RC is preserved, for $q_4$ and $q_5$ the ordering is lost. This points to a fundamental difference between the RCs $q_1$, $q_2$, $q_3$, which embody knowledge of the native state, and the RCs $q_4$, $q_5$, which are purely geometric.

FIG. 3. Results for RC $q_1$ (note that the upper scale is in terms of the unrescaled RC $Q_1$). (a) Free energy profile $\beta F = -\ln\langle\Psi\rangle$. (b) Data points give the round-trip times $\tau_{RT}(q_1, q_1^f)$ extracted from the simulation data for various final states $q_1^f$, denoted by vertical colored bars. The data are shifted vertically for each $q_1^f$ to illustrate the theoretically predicted collapse onto a single mean round-trip curve $\bar\tau_{RT}(q_1)$, with the smooth fit $\bar\tau_{RT,fit}(q_1)$ shown in blue. The red curve denotes the round-trip time from the Bayesian approach. (c) Diffusivity from the round-trip time method, Eq. (9) (blue curve), compared to the variance method, Eq. (10), for lag times $\delta t = 200$ fs, 20 ps, and 200 ps (dash-dotted, dashed, and dotted green curves), and to the Bayesian method (red curve) [42]. (d) MFP or folding time $\tau_{FP}(q_1, q_1^f)$ for the final state $q_1^f = 0.11$, as extracted directly from the simulation data (circles) and compared to predictions from Eq. (3) using the different diffusivities shown in (c). (e) MFP or unfolding time for the final state $q_1^f = 0.57$, same notation as in (d). Vertical dotted lines in (d) and (e) mark the final states $q_1^f$ for folding and unfolding.

In Fig. 3 we focus on RC $q_1$. The free energy profile $\beta F(q_1) = -\ln\langle\Psi(q_1)\rangle$ in (a) reveals the intermediate state and two barriers at $q_1 \approx 0.26$ and $q_1 \approx 0.48$. Fig. 3(b) shows the round-trip times $\tau_{RT}(q_1, q_1^f)$ for various final states $q_1^f$ as a function of $q_1$, directly extracted from the simulation time series [47]. The data sets are shifted vertically (which according to Eq. (9) is irrelevant for extracting $D(q_1)$) to illustrate the predicted collapse onto a single mean round-trip time function $\bar\tau_{RT}(q_1)$. The smooth fit $\bar\tau_{RT,fit}(q_1)$ is shown as a blue curve. The collapse of $\tau_{RT}(q_1, q_1^f)$ for different $q_1^f$ is a strong check on the consistency of the FP approach. The red curve denotes the round-trip time from the Bayesian approach [42], obtained for the optimized time interval and smoothing parameters $\Delta t = 6$ ns and $\gamma = 0.2$ ns$^{-1}$ [47]. Fig. 3(c) shows the diffusivity $D(q_1)$ extracted from $\bar\tau_{RT,fit}(q_1)$ via Eq. (9) (blue curve). Most notably, $D(q_1)$ varies considerably along $q_1$: it is reduced by an order of magnitude around the intermediate state at $q_1 \approx 0.32$ and seems correlated with $F(q_1)$. The $D(q_1)$ profile from the Bayesian approach (red curve) reproduces the coarse features of our round-trip approach, with slight differences that will be discussed below. We stress that we have fitted the two parameters of the Bayesian approach, namely the time interval and the smoothing parameter, by a comparison with the simulated mean first passage times (see Supplement for further details [47]). The diffusivity profiles resulting from the Bayesian approach depend sensitively on these parameters, and without such a comparison it is not easy to see which parameter values are sensible. This highlights an advantage of our method based on the round-trip time, since its only parameter is a smoothing factor that operates directly on the round-trip time, a physical observable, and sensible parameter values are straightforwardly estimated. The variance method, Eq. (10), for lag time $\delta t = 200$ fs (upper green curve) overestimates $D(q_1)$ by two orders of magnitude, yet for $\delta t = 200$ ps (lower green curve) $D_{var}$ approaches the results of the other two methods quite nicely. Thus for $\delta t < 200$ ps, $D_{var}$ is dominated by non-Markovian events that are unrelated to the long-time folding/unfolding dynamics; interestingly, this threshold time is similar to the transition time for helix unwrapping inferred from Fig. 1(d). In Figs. 3(d) and (e), we show MFP times $\tau_{FP}(q_1, q_1^f)$ for $q_1 > q_1^f = 0.11$ (folding) and $q_1 < q_1^f = 0.57$ (unfolding), calculated from Eqs. (3) and (4) and the various $D(q_1)$ profiles shown in (c). The folding time $\tau_{FP}(q_1, q_1^f)$ extracted directly from the simulation data (circles) in Fig. 3(d) is most accurately reproduced by the Bayesian fitting approach (red curve), as expected, since the probability distribution, and thus the frequency of transitions, is maximal in the range $0 \lesssim q_1 \lesssim 0.25$ (see Fig. 2(a)). The RT approach (blue curve) weighs folding and unfolding events equally and consequently describes the unfolding MFP times in Fig. 3(e) better. Notably, the RT approach is simple to implement, works directly on the property one wishes to describe (namely folding/unfolding times), and has, apart from the functional form of the fitted round-trip time $\bar\tau_{RT,fit}(q_1)$, no freely adjustable parameter. The remaining deviations between simulation data and Fokker-Planck predictions in Figs. 3(d,e) are due to a combination of non-Markovian processes at short times and insufficient trajectory sampling.

In Fig. 4(a) we compare the diffusivities based on the round-trip time approach (blue curve) and the Bayesian approach (red curve), already presented in Fig. 3(c), with results obtained from the MFP times via Eq. (5), shown as a green curve. For the fit we used a final state $q_1^f = 0.11$ and considered folding events from $q_1 > q_1^f$ to $q_1^f$. The three curves roughly coincide, which testifies to the robustness of methods for deriving diffusivities from folding times. In Fig. 4(b) we compare diffusivities from the variance method, Eq. (10), to the round-trip time method, Eq. (9) (blue curve). Here we present results for $D_{var}(Q, \delta t)$ for a wider range of lag times, $\delta t = 200$ fs, 20 ps, 200 ps, 2 ns, and 10 ns (green curves, from top to bottom). For lag times between $\delta t = 200$ ps and $\delta t = 2$ ns, $D_{var}(Q, \delta t)$ agrees with the round-trip time approach. As already discussed, for smaller lag times $D_{var}(Q, \delta t)$ is too large. For larger lag times $D_{var}(Q, \delta t)$ loses structure and becomes too small, because at those times the peptide explores a considerable part of the free energy landscape and the effect of the energetic barriers encountered is spuriously accounted for by a reduction of the diffusivity. The situation is similar for the Bayesian approach: there is no a priori way of knowing the suitable value of the lag time unless one compares to a physical observable, such as the folding or round-trip time. In that case, however, directly fitting $D(q)$ to folding times, as suggested here, seems more direct and transparent.

FIG. 4. Results for RC $q_1$. (a) Diffusivity from the round-trip time method, Eq. (9) (blue curve), and the Bayesian approach (red curve); these are the same data already shown in Fig. 3(c). The green curve is based on the first passage time method and follows from Eq. (5) for the final state $q_1^f = 0.11$. (b) Diffusivity from the round-trip time method, Eq. (9) (blue curve), compared to the variance method, Eq. (10), for lag times $\delta t = 200$ fs, 20 ps, 200 ps, 2 ns, and 10 ns (green curves, from top to bottom).

A free energy barrier, as exhibited by $F(q_1)$ in Fig. 3(a), was argued to arise from a subtle compensation of energy and entropy effects, which both increase upon unfolding [3]. This scenario, developed in the context of lattice models, is essentially confirmed by our explicit-water simulations. In Fig. 5(a), we show free energy profiles at different temperatures $T$ from replica-exchange simulations. Indeed, the entropic contribution $TS$, estimated from the free energy difference between $T = 280$ K and 320 K, shows considerable numerical error but rises across the unfolding transition. In Fig. 5(b) we show the number $N_{wat}$ of backbone-bound water molecules, defined as those whose distance to a backbone oxygen is smaller than 0.35 nm. Apart from the loss of one bound water molecule at $q_1 \approx 0.3$ (paralleled by a helicity increase), $N_{wat}$ steadily rises from about 20 in the folded state to about 30 in the unfolded state. We thus conclude that the entropy increase upon unfolding results from a competition between water binding and conformational effects. The overall good agreement at $T = 300$ K between the free energy profile from a standard MD simulation run (of length 1.1 μs) and the results from a replica-exchange MD simulation (trajectory length 22.5 ns, equilibrated with 32 replicas at different temperatures), shown in Fig. 6, gives good evidence that the time series considered in our kinetic analysis is long enough.

FIG. 5. (a) Replica-exchange MD results for the free energy profile $\beta F(q_1)$ at different temperatures $T$, together with the entropic contribution $TS$ obtained from the finite-$T$ difference (with $\Delta T = 20$ K) of $\beta F(q_1)$. (b) Helicity and the number $N_{wat}$ of backbone-bound water molecules vs. $q_1$ at $T = 300$ K.

FIG. 6. Comparison between replica-exchange MD results (red broken curve) and standard MD results (black solid curve) for the free energy profile $\beta F(q_1)$ at $T = 300$ K.

FIG. 7. Free energy profiles (top row), diffusivity profiles (middle row), and folding MFP times (bottom row) for all five reaction coordinates. The columns denoted "Original" show results as a function of the original RCs $q_i$, while in "Transformed" rescaled coordinates $\tilde q_i$ are used such that the diffusivity profiles are constant. The final states $q_i^f$ for folding (marked by dotted vertical lines) are chosen such that they map onto a single value $q^f$, separately for the $q_1, q_2, q_3$ and the $q_4, q_5$ groups.

The appearance of a free energy barrier, as seen in $F(q_1)$ in Fig. 3(a), is often interpreted as equivalent to exponential kinetics. As we will now discuss, this is not necessarily true. In fact, even the presence of a free energy barrier depends on the specific RC employed and is thus a much less robust feature than often assumed. In Fig. 7 we show the free energy profiles $F(q_i)$ and diffusivity profiles $D(q_i)$ of all five RCs. We separate the RCs that embody knowledge of the native state ($q_1, q_2, q_3$) from the unbiased RCs ($q_4, q_5$). In the columns "Original" we use the bare RCs $q_i$ as defined in the Methods section. In the columns "Transformed" we use rescaled RCs $\tilde q_i$ such that the diffusivities are constant, $D(\tilde q_i) = D_0$. Two features strike the eye:

i) Most diffusivity profiles are full of structure and vary substantially along the reaction path; 
it immediately transpires that a description of the folding kinetics without consideration of the 
diffusivity profile can fail. 

ii) The profiles $F(q_i)$ and $D(q_i)$ vary considerably among different RCs. In fact, while $F(q_1)$
shows pronounced barriers and an intermediate state, the profiles F{q2) and F{q-i) are free of 
barriers. We conclude that the presence of barriers depends on the RC chosen.

Do the kinetics within an effective Fokker-Planck description also vary among RCs, possibly showing exponential behavior for some RCs and non-exponential behavior for others? The free energy profiles $F(q_i)$ show large variations as a function of the original RCs. After the transformation, however, the profiles $F(\tilde q_i)$ are quite similar; this is most striking for the radius of gyration, $q_4$, and the end-to-end radius, $q_5$. Consequently, the kinetics, as characterized by the MFP times $\tau_{\rm FP}(q_i, q_i^f)$ in the bottom row, are very similar.

This at first surprising result is easily rationalized. The round-trip method is designed to optimally reproduce the complete set of round-trip times, and thus the slowest conformational transitions in the system. For each RC, the diffusivity $D(q_i)$ and the free energy profile $F(q_i)$ together uniquely determine the folding times. If different RCs yield a comparable separation of states into the unfolded and folded basins, the folding times must therefore be very similar. This in fact holds for the RCs $q_1, q_2, q_3$ on the one hand and for the RCs $q_4, q_5$ on the other.

After the rescaling, the entire kinetic information is contained in the free energy profile, so those profiles must be quite similar. It follows that the presence of a free energy barrier does not necessarily imply exponential kinetics. For that statement to be true, the barrier must persist after an RC transformation that makes the diffusivity profile flat. Some differences remain among the free energy profiles for $\tilde q_1, \tilde q_2, \tilde q_3$ after the transformation, but they are small enough that the resulting kinetics are barely distinguishable.
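The argument above can be checked numerically: $F$ and $D$ together fix the folding time, and a rescaling to flat diffusivity moves all kinetic information into $F$. The sketch below makes the following assumptions.

- **MFP time.** It uses the standard one-dimensional Fokker-Planck expression for the MFP time to an absorbing state $q^f$, with a reflecting boundary at the upper end of the RC:
  $\tau_{\rm FP}(q, q^f) = \int_{q^f}^{q} dx\, e^{\beta F(x)} D(x)^{-1} \int_x^{q_{\max}} dy\, e^{-\beta F(y)}$.
- **Rescaling.** It uses $d\tilde q/dq = \sqrt{D_0/D(q)}$. Under this choice $\tilde D = D_0$ and $\beta \tilde F = \beta F + \ln(d\tilde q/dq)$, because the probability density picks up a factor $(d\tilde q/dq)^{-1}$.
- **Inputs and names.** The profiles are hypothetical toy functions, not the simulation results. The helper names are our own.

```python
import numpy as np

def cumtrapz(y, x):
    """Cumulative trapezoidal integral of y(x), equal to 0 at x[0]."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

def mfp_time(q, beta_F, D, i_start, i_final):
    """MFP time from q[i_start] to an absorbing state q[i_final] < q[i_start];
    the upper end q[-1] acts as a reflecting boundary."""
    W = cumtrapz(np.exp(-beta_F), q)
    inner = W[-1] - W                                  # int_x^{q_max} exp(-beta F) dy
    outer = cumtrapz(np.exp(beta_F) / D * inner, q)
    return outer[i_start] - outer[i_final]

def flatten_diffusivity(q, beta_F, D, D0=1.0):
    """Rescale q -> q~ with dq~/dq = sqrt(D0/D) so that the new diffusivity is D0."""
    J = np.sqrt(D0 / D)               # Jacobian dq~/dq
    q_t = cumtrapz(J, q)              # rescaled coordinate, q~ = 0 at q[0]
    beta_F_t = beta_F + np.log(J)     # density transforms with a factor 1/J
    D_t = D * J**2                    # equals D0 at every grid point
    return q_t, beta_F_t, D_t

# Hypothetical toy profiles on q in [0, 1] (arbitrary units)
q = np.linspace(0.0, 1.0, 4001)
beta_F = 200.0 * (q - 0.2) ** 2 * (q - 0.8) ** 2   # double well, barrier ~1.6 k_B T
D = 0.5 + 0.4 * np.sin(3.0 * np.pi * q) ** 2       # strongly position dependent

i_start, i_final = 3600, 400                        # from q = 0.9 to q^f = 0.1
tau_orig = mfp_time(q, beta_F, D, i_start, i_final)
q_t, beta_F_t, D_t = flatten_diffusivity(q, beta_F, D)
tau_flat = mfp_time(q_t, beta_F_t, D_t, i_start, i_final)
print(tau_orig, tau_flat)   # agree up to discretization error
```

The two MFP times coincide up to discretization error, even though $\beta\tilde F$ differs from $\beta F$ wherever $D$ varies. This mirrors the "Original" versus "Transformed" comparison in Fig. 7.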

To highlight the implications of these results, we now turn the argument around. Consider a general RC transformation

$$q' = q + c\left(\tanh\left[\frac{q - q^*}{d}\right] - 1\right), \tag{12}$$

which is assumed to be a monotonic function. For $d > 0$ this requires $d > -c$. This rescaling corresponds to a local stretching or compression of the RC around $q^*$. Via the reparametrization properties of the Fokker-Planck equation, it also modifies the diffusivity and free energy profiles. In Fig. 8
we show three different rescaled free energy and diffusivity profiles. All are generated via Eq. (12) from the RC $\tilde q_1$, for which $D(\tilde q_1)$ is flat (shown in blue). Depending on the parameters $q^*$, $c$ and $d$, we generate free energy profiles with one of three features:
- a more pronounced barrier (green curve);
- a reduced barrier (red curve);
- a minimum moved from the folded to the unfolded state (turquoise curve).

By construction, the kinetics as characterized by the round-trip or MFP time are invariant under this rescaling. The figure thus demonstrates that a combined rescaling of $F(q)$ and $D(q)$ can generate a bewildering variety of free energy curves that share identical kinetics. The free energy profile without the diffusivity is therefore not sufficient to predict protein folding kinetics, even qualitatively.

FIG. 8. Free energy (top) and diffusivity profiles (bottom) for different rescaled versions of RC $q_1$. Starting from the RC exhibiting a flat diffusivity (shown in blue), we arbitrarily rescale according to Eq. (12) so as to increase the barrier (green), decrease the barrier (red) and relocate the stable minimum (turquoise).

FIG. 9. Test for the quality of reaction coordinate $q_1$. (a) The complete trajectory. (b) The corresponding equilibrium distribution $\langle\Psi(q_1)\rangle$ and the regions A ($q_1 < 0.1$) and B ($q_1 > 0.33$), marked in orange and blue, respectively. The complete trajectory contains 181 transitions between A and B (90 from A to B and 91 from B to A). (c) The splitting probabilities $\phi^A(q_1)$ (orange) and $\phi^B(q_1)$ (blue) and the transition path probability $P(\mathrm{TP}|q_1)$ (red). $P(\mathrm{TP}|q_1)$ reaches its maximum value $P(\mathrm{TP}|q_1) \approx 0.43$ at $q_1 = q_1^{\ddagger} \approx 0.23$, denoted by a red circle in (c) and red lines in (a) and (b).
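Eq. (12) can be applied numerically to check both its monotonicity condition and the invariance of the kinetics.

The transformation laws are as follows:
- The Jacobian is $dq'/dq = 1 + (c/d)\,\mathrm{sech}^2[(q - q^*)/d]$. Since $0 < \mathrm{sech}^2 \le 1$, it stays positive exactly when $1 + c/d > 0$, i.e. $d > -c$ for $d > 0$.
- The transformed profiles are $\beta F' = \beta F + \ln(dq'/dq)$ and $D' = D\,(dq'/dq)^2$.

The sketch starts from a flat $D$, as in Fig. 8. Its assumptions and scope:
- The toy profile and the values of $q^*$, $c$, $d$ are illustrative, not those used for Fig. 8.
- The helper names are our own.
- It shows that the barrier height changes while the MFP time does not.

```python
import numpy as np

def cumtrapz(y, x):
    """Cumulative trapezoidal integral of y(x), equal to 0 at x[0]."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

def mfp_time(q, beta_F, D, i_start, i_final):
    """MFP time to q[i_final] < q[i_start], reflecting boundary at q[-1]."""
    W = cumtrapz(np.exp(-beta_F), q)
    outer = cumtrapz(np.exp(beta_F) / D * (W[-1] - W), q)
    return outer[i_start] - outer[i_final]

def rescale_eq12(q, beta_F, D, q_star, c, d):
    """Apply q' = q + c*(tanh((q - q_star)/d) - 1), Eq. (12), to a 1D FP model."""
    if not d > max(0.0, -c):
        raise ValueError("Eq. (12) is monotonic only for d > 0 and d > -c")
    x = (q - q_star) / d
    q_new = q + c * (np.tanh(x) - 1.0)
    J = 1.0 + (c / d) / np.cosh(x) ** 2        # dq'/dq, positive by the check above
    return q_new, beta_F + np.log(J), D * J**2

q = np.linspace(0.0, 1.0, 4001)
beta_F = 200.0 * (q - 0.2) ** 2 * (q - 0.8) ** 2   # hypothetical profile
D = np.ones_like(q)                                 # flat diffusivity, D0 = 1

i_start, i_final = 3600, 400                        # q = 0.9 -> q^f = 0.1
tau0 = mfp_time(q, beta_F, D, i_start, i_final)
results = []
for c, d in [(0.3, 0.05), (-0.04, 0.05)]:           # stretch / compress around q* = 0.5
    q2, bF2, D2 = rescale_eq12(q, beta_F, D, q_star=0.5, c=c, d=d)
    tau2 = mfp_time(q2, bF2, D2, i_start, i_final)
    # barrier between the states originally at q = 0.5 and q = 0.2
    barrier = bF2[2000] - bF2[800]
    results.append((c, barrier, tau2 / tau0))
    print(f"c = {c:+.2f}: barrier {barrier:.2f} k_BT, tau/tau0 = {tau2 / tau0:.5f}")
```

With $c > 0$ the barrier grows from about 1.6 to about 3.6 $k_BT$, and with $c < 0$ it almost disappears. In both cases the MFP time ratio stays at 1 up to discretization error.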
Much of the discussion in the preceding sections, and the use of one-dimensional RCs in general, presumes that the reaction coordinates are "good". This means that i) the ensemble of transition states is assigned to a narrow region of RC values and ii) the probability of finding a transition state in that region is maximal [33, 34]. To make this notion more concrete, one introduces the splitting probabilities $\phi^A(q)$ and $\phi^B(q)$ for each value of the RC. Here $\phi^A(q)$ is the probability, starting from RC value $q$, to reach region A before region B [34]. In the context of transition states, A and B denote the folded and unfolded regions flanking the transition region. The splitting probabilities are normalized as

$$\phi^A(q) + \phi^B(q) = 1, \tag{13}$$

since a trajectory started anywhere eventually reaches A or B. A trajectory passing through a given value $q$ falls into one of four classes. It can come from A and return to A, come from B and return to B, come from A and proceed to B, or come from B and proceed to A. The respective probabilities are normalized as

$$P(A\to A|q) + P(A\to B|q) + P(B\to A|q) + P(B\to B|q) = 1. \tag{14}$$

For non-ballistic stochastic motion, the transition path probability $P(\mathrm{TP}|q) = P(A\to B|q) + P(B\to A|q)$, i.e. the probability that the trajectory connects regions A and B, can be at most 1/2. A maximum close to 1/2 characterizes a good reaction coordinate; a significantly smaller value points to a bad one.

In Fig. 9 we show a detailed reaction coordinate analysis for RC $q_1$. It uses 25 bins in the range $0.1 < q_1 < 0.33$ and the full time resolution of 20 ps. In (a) we again show the complete time series and in (b) the corresponding probability distribution. Region A ($q_1 < 0.1$) is the folded region; in region B ($q_1 > 0.33$) one helical turn is unfolded. In (c) we show the splitting probabilities $\phi^A(q_1)$ and $\phi^B(q_1)$ (orange and blue lines). The behavior is as expected: the probabilities switch between zero and unity across the interval separating regions A and B, with a rather large slope around $q_1 \approx 0.25$ to $0.30$.

The transition path probability (red curve) reaches its maximum, $P(\mathrm{TP}|q_1^{\ddagger}) \approx 0.43$, at $q_1^{\ddagger} \approx 0.23$. This means that $q_1$ is quite close to a perfect reaction coordinate. The Fokker-Planck analysis performed in this paper is therefore appropriate for long times on the order of folding and unfolding events. Note that $q_1^{\ddagger} \approx 0.23$ lies close to a minimum in the equilibrium distribution $\langle\Psi(q_1)\rangle$ (see Fig. 9(b)), where the free energy thus exhibits a maximum. This is coincidental. As shown in Fig. 8, a reaction-coordinate rescaling can easily change the free energy profile, yet it leaves the splitting probabilities and the transition path probabilities invariant.
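The splitting and transition-path probabilities can be estimated directly from a discretely sampled time series. The sketch below is one straightforward estimator, not necessarily the one used for Fig. 9; the function name is our own.

How it works:
- For every frame outside A and B, it looks backward for the last region visited and forward for the next one.
- $\phi^B(q)$ is the fraction of frames in a bin that next reach B.
- $P(\mathrm{TP}|q)$ is the fraction of frames whose last and next regions differ.
- Frames before the first or after the last visit to A or B are discarded.

Where the 1/2 bound comes from: for Markovian, time-reversible dynamics along the RC, $P(A\to B|q) = P(B\to A|q) = \phi^A(q)\phi^B(q)$. Hence $P(\mathrm{TP}|q) = 2\phi^A(q)\phi^B(q) \le 1/2$.

```python
import numpy as np

def committors_and_tp(traj, a, b, edges):
    """Estimate phi_A, phi_B and P(TP|q) per bin from a 1D time series.

    Region A: traj < a; region B: traj > b; `edges` are bin edges spanning [a, b].
    """
    traj = np.asarray(traj, dtype=float)
    n = len(traj)
    region = np.where(traj < a, 0, np.where(traj > b, 1, -1))  # 0 = A, 1 = B, -1 = between

    last = np.empty(n, dtype=int)   # last region visited up to frame t (-1: none yet)
    nxt = np.empty(n, dtype=int)    # next region visited from frame t on (-1: none)
    cur = -1
    for t in range(n):
        if region[t] >= 0:
            cur = region[t]
        last[t] = cur
    cur = -1
    for t in range(n - 1, -1, -1):
        if region[t] >= 0:
            cur = region[t]
        nxt[t] = cur

    usable = (region == -1) & (last >= 0) & (nxt >= 0)
    idx = np.digitize(traj, edges) - 1
    nbins = len(edges) - 1
    phi_B = np.full(nbins, np.nan)
    p_tp = np.full(nbins, np.nan)
    for k in range(nbins):
        m = usable & (idx == k)
        if m.any():
            phi_B[k] = np.mean(nxt[m] == 1)          # next region is B
            p_tp[k] = np.mean(last[m] != nxt[m])     # frame lies on an A-B transition path
    return 1.0 - phi_B, phi_B, p_tp

# Toy series: A -> between -> B -> between -> B -> between -> A
traj = [0.0, 0.5, 0.5, 1.0, 0.5, 1.0, 0.5, 0.0]
phi_A, phi_B, p_tp = committors_and_tp(traj, a=0.1, b=0.9, edges=[0.1, 0.9])
```

For this toy series both $\phi^B$ and $P(\mathrm{TP})$ equal 0.75 in the single bin. The series is far too short to respect the 1/2 bound and only illustrates the bookkeeping. Applied to a long trajectory with 25 bins, the same routine would produce curves of the kind shown in Fig. 9(c).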

IV. CONCLUSIONS 

In the naive approach to protein kinetics, folding times are deduced from the free energy profile $F(Q)$ alone. As has been argued before [24-27], such an approach is unreliable. Even for the simplest non-trivial folder, a single short α-helix in explicit solvent, the diffusivity profile $D(Q)$ varies substantially along the folding path. Our $D(Q)$ variation comes out somewhat stronger than in similar simulations with implicit solvent. This suggests that explicit solvent further increases the importance of diffusivity inhomogeneities [24]. In fact, to match experimental folding times of simple α-helix-forming oligopeptides within solvent-implicit simulations, an overall correction factor to the time scales is typically applied [47, 48]. A detailed microscopic justification for this is lacking. On the contrary, it has been shown that in many cases explicit solvent strongly influences the free energy landscape and introduces novel kinetic mechanisms that are completely absent in solvent-implicit simulations [49, 50].

When extending the analysis to five popular reaction coordinates, we find that the free energy and diffusivity profiles vary substantially among the RC representations. Yet the kinetics that follow from a Fokker-Planck description are largely independent of the RC chosen, if and only if $D(Q)$ is properly accounted for. A similar conclusion was reached recently based on coarse-grained, solvent-implicit simulations [26]. A quasi-universal (i.e. RC-independent) description of protein folding kinetics therefore necessarily involves $D(Q)$. For this quasi-universality to hold, we have to distinguish between two kinds of reaction coordinate: those based on the distance to the native state (such as $Q_1, Q_2, Q_3$) and those that are purely geometric in nature (such as $Q_4, Q_5$).

By considering generalized RCs and using the reparametrization invariance of the Fokker-Planck equation, we can design arbitrary $F(Q)$ profiles. These can have no barrier at all, an enhanced barrier, or an interchange of the naive stable and unstable states. The concept of a free energy profile is therefore to some degree arbitrary, which might be relevant to recent discussions in the experimental literature [11-13]. The kinetics, embodied in the folding time and dependent on both $F(Q)$ and $D(Q)$, are less arbitrary.
Our simulations are for a single α-helix fragment, one of the shortest oligopeptides that shows non-trivial folding. There is no reason to believe that the situation will simplify for larger proteins. We therefore argue that the diffusivity profile will be full of features, and thus important, in those more complicated situations as well. Our conclusions also apply to optimized or otherwise carefully selected RCs [33-37]. The reparametrization can be done for any RC and can thus arbitrarily create, annihilate and shift barriers in the folding landscape. (Incidentally, RC $q_1$ turns out to be a quite good reaction coordinate according to the definition of Ref. [34], as shown in Fig. 9.) Our method extracts the diffusivity profile via the mean-first-passage or round-trip time formalism. It can easily be applied to time series data from FRET or force-spectroscopy experiments, so an experimental test of our results is possible.
Supplement to: "How the diffusivity profile reduces the arbitrariness of protein folding free energies"

M. Hinczewski, Y. von Hansen, J. Dzubiella, R. R. Netz 

S.1. MAPPING BETWEEN REACTION COORDINATES

In its columns, Fig. S.1 shows the average distribution function $\langle\Psi(q_i)\rangle$ as a function of each of the five RCs considered in the main text. In each row, the coloured regions denote identical subsets of states and are chosen to correspond to pure states for one reaction coordinate. The ordering of the coloured regions is preserved among the RCs $q_1, q_2, q_3$ and among the RCs $q_4, q_5$, but it is lost between those two groups. This points to a fundamental difference between the RCs $q_1, q_2, q_3$, which embody knowledge of the native state, and the RCs $q_4, q_5$, which are purely geometric.

S.2. EXTRACTING THE DIFFUSIVITY PROFILE

In Fig. S.2 we show the free energy profiles, the round-trip times and the diffusivity profiles of all five reaction coordinates. In Fig. S.3 we show the mean first-passage times for folding and unfolding events for all reaction coordinates, as extracted from the fitted diffusivity profiles and the Fokker-Planck description. The final states $q_i^f$ were chosen such that $\approx 20\%$ of the probability distribution $\langle\Psi(q_i)\rangle$ is contained in the range $0 < q_i < q_i^f$ (folding) or $q_i^f < q_i < 1$ (unfolding). The $\tau_{\rm FP}$ curves extracted from the simulation data show noise and non-monotonicity. These stem from the statistical effects of insufficient trajectory sampling, particularly at the edges of the free energy landscape, and from time discretization.
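We assume here that the round-trip time is the sum of the forward and backward MFP times between $q^f$ and $q$, with reflecting boundaries at both ends of the RC.

**Relation.** Adding the two one-dimensional Fokker-Planck MFP integrals gives
$\tau_{\rm RT}(q, q^f) = \int_{q^f}^{q} dx\, [\Psi(x) D(x)]^{-1}$,
where $\Psi$ is the normalized equilibrium density. A different $q^f$ only shifts $\tau_{\rm RT}$ by a constant, consistent with the collapse in Fig. S.2(c). Differentiating gives $D(q) = [\Psi(q)\,\partial_q \tau_{\rm RT}(q)]^{-1}$.

**What the sketch does.** It builds $\tau_{\rm RT}$ from a known toy $D$ and recovers it by finite differences. It illustrates the inversion only and is not the fitting procedure behind Fig. S.2; the function names are our own.

```python
import numpy as np

def cumtrapz(y, x):
    """Cumulative trapezoidal integral of y(x), equal to 0 at x[0]."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

def diffusivity_from_round_trip(q, psi, tau_rt):
    """D(q) = 1 / (psi(q) * d tau_RT/dq); psi is the normalized equilibrium density."""
    dtau_dq = np.gradient(tau_rt, q, edge_order=2)
    return 1.0 / (psi * dtau_dq)

q = np.linspace(0.0, 1.0, 2001)
beta_F = 200.0 * (q - 0.2) ** 2 * (q - 0.8) ** 2     # hypothetical profile
psi = np.exp(-beta_F)
psi /= cumtrapz(psi, q)[-1]                         # normalize on the grid
D_true = 0.5 + 0.4 * np.sin(3.0 * np.pi * q) ** 2   # hypothetical diffusivity

# Round-trip time with final state q^f = q[0]; another q^f only shifts tau_RT
tau_rt = cumtrapz(1.0 / (psi * D_true), q)
D_est = diffusivity_from_round_trip(q, psi, tau_rt)
max_rel_err = np.max(np.abs(D_est / D_true - 1.0))
```

With noisy simulation data, one would differentiate a smooth fit of $\tau_{\rm RT}$ rather than the raw points, since numerical derivatives amplify noise. The fit function shown in Fig. S.2(c) plays this role.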

S.3. DETERMINING DIFFUSIVITY PROFILES BY BAYESIAN INFERENCE

We briefly review the optimization method introduced in Ref. [1], which was used previously to extract diffusivity profiles for protein folding dynamics in implicit solvent [2].
FIG. S.1. Mapping between different reaction coordinates. Columns show the density distribution $\langle\Psi(q)\rangle$ plotted in terms of the different reaction coordinates $q_1$, $q_2$, $q_3$, $q_4$ and $q_5$. In each row, selected regions of the distribution are shown in color.



A. Master equation approach 

When discretized in reaction coordinate space, the FP equation takes the form of a master equation [3]

$$\frac{d\Psi_i(t)}{dt} = R_{i,i-1}\Psi_{i-1}(t) + R_{i,i+1}\Psi_{i+1}(t) + R_{i,i}\Psi_i(t), \tag{S.1}$$

where:
- $\Psi_i(t) = \Psi(Q^{(i)}, t)\,\Delta Q$ is the probability of being in bin $i$;
- $\Delta Q$ is the bin width;
- the bin index $i$ ranges from 1 to $M$;
- $R_{ij}$ is the transition rate from bin $j$ to bin $i$.

The rates fulfill detailed balance, i.e. $R_{ij}\langle\Psi_j\rangle = R_{ji}\langle\Psi_i\rangle$, where $\langle\Psi_i\rangle$ denotes the equilibrium probability of bin $i$. The loss from bin $i$ is caused by transitions to neighboring bins, i.e. $R_{i,i} = -\sum_{j\neq i} R_{j,i}$. The rates in the master equation (S.1) are related to the free energy $F(Q)$ and the diffusivity profile $D(Q)$ in the FP equation via

$$F(Q^{(i)}) = -k_B T \log\frac{\langle\Psi_i\rangle}{\Delta Q}, \tag{S.2}$$

$$D_{i+1/2} \approx (\Delta Q)^2 R_{i,i+1}\sqrt{\frac{\langle\Psi_{i+1}\rangle}{\langle\Psi_i\rangle}}, \tag{S.3}$$

with $D_{i+1/2} = [D(Q^{(i)}) + D(Q^{(i+1)})]/2$ being the diffusivity between the bins. For $M$ bins the system is consequently characterized by $2M-1$ independent parameters: $M-1$ rates $R_{i,i+1}$ for transitions from the neighboring bin on the right-hand side, and $M$ equilibrium probabilities $\langle\Psi_i\rangle$.

FIG. S.2. The columns give results for all RCs considered. a) Free energy profile $\beta F(q_i) = -\ln\langle\Psi(q_i)\rangle$. b) Round-trip times $\tau_{\rm RT}(q_i, q_i^f)$ extracted from the simulation data (data points) for various final states $q_i^f$, denoted by vertical colored bars. c) The same data shifted vertically to illustrate the approximate collapse onto a single mean round-trip function $\bar\tau_{\rm RT}(q_i)$ for all $q_i^f$; the smooth fit function $\bar\tau_{\rm RT}^{\rm fit}(q_i)$ is shown as a blue curve, and the red curve denotes the round-trip time from the Bayesian approach. d) Diffusivity from the round-trip time method (blue curve), compared to the Bayesian method (red curve).

FIG. S.3. The columns give results for all RCs considered. Upper row: MFP or folding time $\tau_{\rm FP}(q_i, q_i^f)$ to different final states $q_i^f$. Values extracted directly from the simulation data (circles) are compared to Fokker-Planck predictions using the diffusivities from the round-trip time approach (blue curves) and the Bayesian approach (red curves). The optimized parameters of the Bayesian approach are $\gamma = 0.2$/ns with $\Delta t = 6$ ns for $q_1$ and $q_2$, and $\Delta t = 2$ ns for $q_3$, $q_4$ and $q_5$. Lower row: MFP or unfolding time, same notation as in the upper row. The vertical dotted lines in both rows mark the final states $q_i^f$ for folding and unfolding.
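The discretization in Eqs. (S.1) to (S.3) can be sketched as follows.

**Rate construction.** We assume the symmetric split of the free energy difference between forward and backward hops:
- $R_{i+1,i} = (D_{i+1/2}/\Delta Q^2)\sqrt{\langle\Psi_{i+1}\rangle/\langle\Psi_i\rangle}$;
- $R_{i,i+1} = (D_{i+1/2}/\Delta Q^2)\sqrt{\langle\Psi_i\rangle/\langle\Psi_{i+1}\rangle}$.

This choice satisfies detailed balance and inverts to Eq. (S.3).

**What the sketch does.** It builds the tridiagonal rate matrix from toy profiles and checks detailed balance and stationarity. It then recovers $D_{i+1/2}$ and $\beta F$ via Eqs. (S.3) and (S.2). The function name `rate_matrix` and all input values are hypothetical.

```python
import numpy as np

def rate_matrix(psi, D_half, dQ):
    """Tridiagonal rate matrix with R[i, j] = rate from bin j to bin i.

    psi:    equilibrium bin probabilities <Psi_i> (length M)
    D_half: diffusivities D_{i+1/2} between bins i and i+1 (length M-1)
    """
    psi = np.asarray(psi, dtype=float)
    M = len(psi)
    R = np.zeros((M, M))
    for i in range(M - 1):
        k = D_half[i] / dQ**2
        R[i + 1, i] = k * np.sqrt(psi[i + 1] / psi[i])   # hop i -> i+1
        R[i, i + 1] = k * np.sqrt(psi[i] / psi[i + 1])   # hop i+1 -> i
    R[np.diag_indices(M)] = -R.sum(axis=0)                # R_ii = -sum_{j != i} R_ji
    return R

# Hypothetical toy profiles: M bins of width dQ on [0, 1]
M = 50
dQ = 1.0 / M
Q = (np.arange(M) + 0.5) * dQ                        # bin centres Q^(i)
beta_F = 200.0 * (Q - 0.2) ** 2 * (Q - 0.8) ** 2
psi = np.exp(-beta_F)
psi /= psi.sum()                                     # <Psi_i>, sums to 1
D = 0.5 + 0.4 * np.sin(3.0 * np.pi * Q) ** 2
D_half = 0.5 * (D[:-1] + D[1:])                      # D_{i+1/2}

R = rate_matrix(psi, D_half, dQ)
flux = R * psi[None, :]                              # flux[i, j] = R_ij <Psi_j>
# Invert Eq. (S.3) and Eq. (S.2) (the latter in units of k_B T)
D_rec = dQ**2 * np.diag(R, k=1) * np.sqrt(psi[1:] / psi[:-1])
beta_F_rec = -np.log(psi / dQ)                       # equals beta_F up to a constant
```

The matrix `flux` is symmetric, which is detailed balance, and `R @ psi` vanishes, so $\langle\Psi_i\rangle$ is stationary under Eq. (S.1).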



B. Bayesian analysis of trajectories 



In a system described by Eq. (S.1), the conditional probability of landing in bin $i$ in time $\Delta t$ given a start in bin $j$ is

$$p(i|j;\Delta t) = \left(e^{\Delta t\,\mathbf{R}}\right)_{ij}, \tag{S.4}$$

where $\mathbf{R}$ is the matrix with entries $R_{ij}$. In our case it is tridiagonal. The transition probabilities are easily obtained numerically by diagonalizing the symmetrized matrix $\tilde{\mathbf{R}}$ with entries $\tilde R_{ij} = R_{ij}\,(\langle\Psi_j\rangle/\langle\Psi_i\rangle)^{1/2}$ [3]. For a process described by Eq. (S.1), the likelihood of observing a certain sequence $\{Q^{(i_\alpha)}(t_\alpha)\}_{\alpha=0}^{N}$ of $N$ transitions at equidistant time intervals $\Delta t$ is

$$L = \prod_{\alpha=1}^{N} p(i_\alpha|i_{\alpha-1}; t_\alpha - t_{\alpha-1}) = \prod_{i,j} p(i|j;\Delta t)^{N_{ij}}, \tag{S.5}$$

where $N_{ij}$ is the total number of transitions from $j$ to $i$ observed along the trajectory, and $t_\alpha - t_{\alpha-1} = \Delta t$ for all $\alpha$. Bayesian inference (BI) can be used to determine the underlying free energy $F(Q)$ and diffusivity profile $D(Q)$ from a stochastic trajectory. Bayes' theorem states that, for a given trajectory (= data), the probability that certain parameters $\{F, D\}$ are correct is

$$p(\{F,D\}|\mathrm{data}) = \frac{p(\mathrm{data}|\{F,D\})\,p(\{F,D\})}{p(\mathrm{data})} \propto L \cdot \underbrace{\prod_i \exp\left(-\frac{(D_{i+1/2} - D_{i-1/2})^2}{2\gamma^2}\right)}_{=\,p(\{F,D\})}, \tag{S.6}$$

where $L$ is the likelihood of Eq. (S.5). In our case the prior $p(\{F,D\})$ depends only on the diffusivity profile $D(Q)$. It penalizes large deviations of the diffusivity at adjacent grid points.
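The likelihood of Eqs. (S.4) and (S.5) can be evaluated in a few lines. The steps are:
1. Build the propagator from the symmetrized rate matrix, using $\tilde{\mathbf{R}} = S^{-1}\mathbf{R}S$ with $S = \mathrm{diag}(\langle\Psi_i\rangle^{1/2})$, so that $e^{\Delta t\mathbf{R}} = S\,e^{\Delta t\tilde{\mathbf{R}}}S^{-1}$.
2. Count transitions along a binned trajectory.
3. Add a smoothness penalty on neighboring $D_{i+1/2}$ with parameter $\gamma$.

**Assumptions.** The Gaussian form of the prior is our reading of Eq. (S.6). The function names, the toy trajectory and the parameter values are hypothetical.

```python
import numpy as np

def rate_matrix(psi, D_half, dQ):
    """Tridiagonal R with R[i, j] = rate j -> i, consistent with Eq. (S.3)."""
    psi = np.asarray(psi, dtype=float)
    M = len(psi)
    R = np.zeros((M, M))
    k = np.asarray(D_half, dtype=float) / dQ**2
    idx = np.arange(M - 1)
    R[idx + 1, idx] = k * np.sqrt(psi[1:] / psi[:-1])
    R[idx, idx + 1] = k * np.sqrt(psi[:-1] / psi[1:])
    R[np.diag_indices(M)] = -R.sum(axis=0)
    return R

def propagator(R, psi, dt):
    """p(i|j; dt) = (exp(dt R))_ij, Eq. (S.4), via the symmetrized matrix R~."""
    s = np.sqrt(psi)
    R_sym = R * s[None, :] / s[:, None]        # R~_ij = R_ij sqrt(psi_j / psi_i)
    R_sym = 0.5 * (R_sym + R_sym.T)            # symmetric up to round-off
    lam, V = np.linalg.eigh(R_sym)
    exp_sym = (V * np.exp(lam * dt)) @ V.T     # exp(dt R~) = V diag(e^{lam dt}) V^T
    return exp_sym * s[:, None] / s[None, :]   # undo the similarity transform

def transition_counts(bin_traj, M):
    """N[i, j] = number of observed transitions j -> i between consecutive frames."""
    b = np.asarray(bin_traj)
    N = np.zeros((M, M), dtype=int)
    np.add.at(N, (b[1:], b[:-1]), 1)
    return N

def log_posterior(psi, D_half, dQ, N, dt, gamma):
    """ln L + ln p({F, D}) from Eqs. (S.5)-(S.6), up to an additive constant."""
    P = propagator(rate_matrix(psi, D_half, dQ), psi, dt)
    log_L = np.sum(N * np.log(np.clip(P, 1e-300, None)))   # clip round-off negatives
    log_prior = -np.sum(np.diff(D_half) ** 2) / (2.0 * gamma**2)
    return log_L + log_prior

# Toy usage: 20 bins, uniform equilibrium distribution, constant diffusivity
M, dQ = 20, 0.05
psi = np.full(M, 1.0 / M)
D_half = np.full(M - 1, 0.01)
P = propagator(rate_matrix(psi, D_half, dQ), psi, dt=1.0)
N = transition_counts([0, 1, 1, 2, 1, 0], M)
lp = log_posterior(psi, D_half, dQ, N, dt=1.0, gamma=0.2)
```

Each column of the propagator sums to one, and $\langle\Psi\rangle$ is left invariant. In the optimization, only `log_posterior` needs to be re-evaluated after every parameter change.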

C. Optimization procedure 

A standard simulated annealing scheme is used to optimize the probability $p(\{F,D\}|\mathrm{data})$ in Eq. (S.6) by iterative variation of the $2M-1$ parameters of the system. The quantity to be minimized is the "energy"

$$E = -\ln L - \ln p(\{F,D\}). \tag{S.7}$$

At each step the parameters $\{R_{i,i+1}\}_{i=1}^{M-1}$ and $\{\langle\Psi_i\rangle\}_{i=1}^{M}$ are slightly perturbed, giving rise to a new configuration with energy $E^{\rm new}$. This configuration is always accepted for $E^{\rm new} < E$ and is accepted with probability $p^{\rm acc} = \exp[-(E^{\rm new} - E)/T]$ for $E^{\rm new} > E$. The "temperature" $T$ of the system is then lowered step by step until the optimized $F(Q)$ and $D(Q)$ are reached. Several independent simulated annealing runs are performed. The variation among the results of different runs indicates the quality of the estimate and whether the process is suitable for an FP-type description.
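The annealing loop described above can be sketched generically.

**Setup.** `energy` stands for $E$ of Eq. (S.7); here it is a hypothetical quadratic placeholder so that the snippet runs on its own. Parameters are perturbed by small Gaussian steps, and the temperature is lowered geometrically.

**Assumptions.** The step size, cooling factor and number of steps are illustrative, not the values used in this work. In the real problem the rates and probabilities must stay positive, for example by perturbing their logarithms.

```python
import numpy as np

def simulated_annealing(energy, x0, T0=1.0, cooling=0.999, n_steps=20000,
                        step=0.05, seed=0):
    """Minimize `energy` with Metropolis moves at a slowly decreasing temperature."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    E = energy(x)
    best_x, best_E = x.copy(), E
    T = T0
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal(x.shape)   # small random perturbation
        E_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with exp(-(E_new - E)/T)
        if E_new < E or rng.random() < np.exp(-(E_new - E) / T):
            x, E = x_new, E_new
            if E < best_E:
                best_x, best_E = x.copy(), E
        T *= cooling                                       # lower the "temperature"
    return best_x, best_E

# Placeholder energy with its minimum at (1, -2)
target = np.array([1.0, -2.0])
x_opt, E_opt = simulated_annealing(lambda x: np.sum((x - target) ** 2), x0=[5.0, 5.0])
```

Running several seeds and comparing the resulting `x_opt` mirrors the use of independent annealing runs to judge the quality of the estimate.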



D. Dependence of $D(Q)$ on the time interval $\Delta t$ and the smoothing parameter $\gamma$

The Bayesian optimization method is applied to the dynamics of the reaction coordinate $Q = q_1$. In Fig. S.4 we compare results obtained for different time intervals $\Delta t$. Fig. S.5 shows results for different values of the smoothing parameter $\gamma$, which weights the prior in Eq. (S.6).

FIG. S.4. Free energy $F$ (a), diffusivity $D$ (b) and MFP times $\tau_{\rm FP}(q_1, q_1^f)$ (panels (c) and (d)) for a fixed value $\gamma = 0.2$/ns and different time intervals $\Delta t$ used in the optimization procedure. The free energy obtained from the equilibrium analysis of the trajectory is shown as a solid black curve in (a). The target states $q_1^f = 0.11$ and $q_1^f = 0.57$ are denoted by vertical dashed black lines. The values of $\tau_{\rm FP}(q_1, q_1^f)$ extracted directly from the simulation data are shown as black circles in (c) and (d).



We show results for 60 bins along the RC, averaging $F(q_1)$ and $D(q_1)$ over 50 independent optimization runs. Although $F(q_1)$ is a fit quantity, it does not differ significantly from the free energy obtained from the equilibrium distribution $\langle\Psi_i\rangle$ of the trajectory (black curves in panel (a) of both figures). The diffusivity profile $D(q_1)$, in contrast, depends strongly on the time interval $\Delta t$. For $\Delta t = 20$ ps we obtain profiles almost identical to those from the variance method analysis, but the diffusivity decreases for larger $\Delta t$. The parameter $\gamma$ can compensate for insufficient sampling by externally requiring smoothness of the diffusivity. However, strong external constraints, corresponding to low $\gamma$ values, tend to erase any structure in $D(q_1)$.

FIG. S.5. Same as Fig. S.4, but showing the influence of the smoothing parameter $\gamma$ on the diffusivity profile $D(q_1)$ for a fixed time interval $\Delta t = 6$ ns.

Reasonable choices of the parameters $\gamma$ and $\Delta t$ are not evident a priori. We therefore check that each optimization result correctly reproduces the long-time dynamics. For each optimized diffusivity profile, we compute the position-dependent MFP times $\tau_{\rm FP}(q_1, q_1^f)$ to a folded state ($q_1^f = 0.11$) and to an unfolded one ($q_1^f = 0.57$). We then compare these curves to those extracted directly from the simulation data. This comparison shows that in our case $\gamma = 0.2$/ns and $\Delta t = 6$ ns are sensible values.
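The comparison above needs MFP times "extracted directly from the simulation data". Below is a minimal sketch of one such estimator; the function name and binning are our own.

**Assumptions.**
- Frames are equidistant with spacing `dt`.
- Folding proceeds toward small $q$, as for $q_1^f = 0.11$.

**Method.**
- For every frame with $q > q^f$, record the waiting time until the trajectory first reaches $q \le q^f$.
- Average these waiting times per starting bin.
- Discard frames that never reach the target before the series ends. This biases long MFP times downward for short trajectories.

```python
import numpy as np

def mfp_times_from_series(traj, q_final, edges, dt):
    """Mean first-passage time to q <= q_final, binned by the starting value."""
    traj = np.asarray(traj, dtype=float)
    n = len(traj)
    # index of the first frame at or after t with traj <= q_final (-1: never)
    next_hit = np.full(n, -1)
    hit = -1
    for t in range(n - 1, -1, -1):
        if traj[t] <= q_final:
            hit = t
        next_hit[t] = hit
    valid = (traj > q_final) & (next_hit >= 0)
    waits = (next_hit - np.arange(n)) * dt
    idx = np.digitize(traj, edges) - 1
    tau = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        m = valid & (idx == k)
        if m.any():
            tau[k] = waits[m].mean()
    return tau

# Toy series (arbitrary units); target q^f = 0.1
traj = [0.5, 0.4, 0.05, 0.5, 0.3, 0.2, 0.0]
tau = mfp_times_from_series(traj, q_final=0.1, edges=[0.1, 0.35, 0.6], dt=1.0)
```

For the toy series the estimator returns 1.5 for the lower bin and 2.0 for the upper bin. Curves of this kind, computed from the full trajectory, are what the Fokker-Planck predictions for each $(\gamma, \Delta t)$ pair would be compared against.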